perf: specialize Builtin1 and Builtin3 apply paths #807

Merged: stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/builtin-apply-overrides on Apr 30, 2026
He-Pin commented Apr 30, 2026

Split out from #763.

Motivation:

Reduce allocation and dispatch overhead when one- and three-argument builtins are called through the dynamic Val.Func.apply1 / apply3 path.

Key Design Decision:

Keep the optimization local and semantics-preserving. Builtin2 already has an exact-arity apply2 override; this adds matching Builtin1.apply1 and Builtin3.apply3 overrides. Exact positional calls directly invoke the structured evalRhs overload and skip constructing an intermediate Array. Non-exact paths still fall back to the generic parent application path.

Correctness:

  • The direct path matches the existing Builtin1.apply / Builtin3.apply exact positional behavior: force the supplied Eval values, then call the typed evalRhs.
  • Named arguments, missing defaults, too many arguments, and other non-exact calls still use the generic function application logic.
  • Static Expr.ApplyBuiltin1 / Expr.ApplyBuiltin3 paths are unchanged; this only helps dynamic builtin calls such as a builtin stored in a local or returned from another function.
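
The exact-arity shortcut described above can be sketched with a deliberately simplified model. This is not the sjsonnet source: `Eval`, `Strict`, `Func`, `Builtin1`, `Length`, `LengthViaArray`, and the `arrayPathCalls` counter are hypothetical stand-ins, and the real `Val.Func` / `Val.Builtin1` signatures carry extra parameters (positions, evaluation scope) that are omitted here. It only illustrates the idea that an `apply1` override can force its argument and call the typed `evalRhs` directly, while the inherited default wraps the argument in an array first.

```scala
object BuiltinApplySketch {
  // Hypothetical stand-in for a lazily evaluated argument.
  trait Eval { def force: Any }
  final case class Strict(value: Any) extends Eval { def force: Any = value }

  // Hypothetical stand-in for the generic function type.
  abstract class Func {
    // Instrumentation for this sketch only: counts array-based dispatches.
    var arrayPathCalls = 0

    // Generic path: arguments arrive packed in an array.
    def apply(args: Array[Eval]): Any

    // Default dynamic one-argument entry point: allocate an array, then
    // delegate to the generic path.
    def apply1(arg: Eval): Any = {
      arrayPathCalls += 1
      apply(Array(arg))
    }
  }

  // One-argument builtin with a typed evalRhs and an exact-arity override.
  abstract class Builtin1 extends Func {
    def evalRhs(arg: Any): Any

    def apply(args: Array[Eval]): Any = {
      if (args.length != 1)
        throw new IllegalArgumentException(s"expected 1 argument, got ${args.length}")
      evalRhs(args(0).force)
    }

    // Direct path: force the argument and call evalRhs, no intermediate array.
    override def apply1(arg: Eval): Any = evalRhs(arg.force)
  }

  private def lengthOf(v: Any): Any = v match {
    case s: String  => s.length
    case xs: Seq[_] => xs.length
    case other      => throw new IllegalArgumentException(s"no length for $other")
  }

  // Uses the specialized apply1.
  object Length extends Builtin1 {
    def evalRhs(arg: Any): Any = lengthOf(arg)
  }

  // Baseline for comparison: relies on the inherited array-based apply1.
  object LengthViaArray extends Func {
    def apply(args: Array[Eval]): Any = lengthOf(args(0).force)
  }
}
```

In this model both objects return the same result for the same input; the only difference is that `Length.apply1` never touches the array path, which is the allocation the PR aims to avoid for dynamic calls.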

Modification:

  • Add Builtin1.apply1.
  • Add Builtin3.apply3.

Validation:

  • ./mill --no-server 'sjsonnet.jvm[3.3.7].compile'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test' (141/141, SUCCESS)
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
  • ./mill --no-server '_.jvm[_].__.test' (1104/1104, SUCCESS)
  • Dynamic builtin smoke checks:
    • local f = std.length; f([1, 2, 3]) -> 3
    • local f = std.substr; f("abcdef", 1, 3) -> "bcd"
    • local f = std.substr; f(str="abcdef", from=1, len=3) -> "bcd"

Hyperfine:

Toolchain:

  • hyperfine 1.20.0
  • --warmup 3 --min-runs 25 for targeted dynamic builtin benchmarks
  • --warmup 3 --min-runs 20 for realistic2
  • JVM assemblies built with ./mill --no-server show 'sjsonnet.jvm[3.3.7].assembly'
  • Base: upstream/master at c04fc804
  • Branch: 2067d8b5

Targeted Builtin1 dynamic call benchmark:

local identity(x) = x;
local len = identity(std.length);
std.foldl(
  function(acc, i) acc + len("abcdef"),
  std.range(1, 5000000),
  0
)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| master builtin1_dynamic | 649.8 +/- 48.3 | 557.4 | 726.8 | 1.00 |
| branch builtin1_dynamic | 661.7 +/- 41.0 | 606.1 | 747.3 | 1.02 +/- 0.10 |

Result: statistically neutral in this hyperfine run.

Targeted Builtin3 dynamic call benchmark:

local identity(x) = x;
local substr = identity(std.substr);
std.foldl(
  function(acc, i) acc + std.length(substr("abcdef", 1, 3)),
  std.range(1, 3000000),
  0
)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| master builtin3_dynamic | 742.5 +/- 156.1 | 594.1 | 1254.7 | 1.12 +/- 0.30 |
| branch builtin3_dynamic | 660.4 +/- 110.9 | 534.0 | 962.6 | 1.00 |

Result: branch was faster in this run, but variance is high.

End-to-end realistic2:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| master realistic2 | 544.9 +/- 95.3 | 414.1 | 706.5 | 1.27 +/- 0.27 |
| branch realistic2 | 428.4 +/- 54.1 | 378.3 | 565.8 | 1.00 |

Result: branch was faster in this run; due to JVM startup and system noise, treat this as a non-regression signal rather than a guaranteed 1.27x speedup.
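
For readers checking the Relative column, the reported figures are consistent with dividing each mean by the fastest mean and propagating the two standard deviations to first order. The sketch below recomputes the realistic2 row under that assumption; `Run` and `relative` are hypothetical helper names, not part of hyperfine or sjsonnet.

```scala
object RelativeSpeed {
  // Mean and standard deviation of one benchmark command, in milliseconds.
  final case class Run(name: String, meanMs: Double, stddevMs: Double)

  // Ratio slow/fast with first-order propagated uncertainty:
  // err = ratio * sqrt((s_slow / m_slow)^2 + (s_fast / m_fast)^2)
  def relative(slow: Run, fast: Run): (Double, Double) = {
    val ratio = slow.meanMs / fast.meanMs
    val relErr = math.sqrt(
      math.pow(slow.stddevMs / slow.meanMs, 2) +
        math.pow(fast.stddevMs / fast.meanMs, 2)
    )
    (ratio, ratio * relErr)
  }

  // Figures copied from the realistic2 table.
  val master: Run = Run("master realistic2", 544.9, 95.3)
  val branch: Run = Run("branch realistic2", 428.4, 54.1)

  def main(args: Array[String]): Unit = {
    val (ratio, err) = relative(master, branch)
    println(s"ratio = $ratio, uncertainty = $err")
  }
}
```

The computed ratio rounds to 1.27 with an uncertainty that rounds to 0.27, so the interval reaches down to roughly 1.00. That is why the run is better read as a non-regression signal than as a measured speedup.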

Motivation:
Reduce allocation and dispatch overhead when one- and three-argument builtins are called through the dynamic function apply path.

Modification:
Add Builtin1.apply1 and Builtin3.apply3 overrides that directly call their structured evalRhs methods for exact positional arity, matching the existing Builtin2.apply2 specialization and falling back to the generic parent path otherwise.

Result:
Dynamic builtin calls avoid constructing temporary argument arrays on the exact-arity path. JVM tests pass, and the targeted hyperfine comparisons show no regression.

stephenamar-db merged commit 192bb33 into databricks:master on Apr 30, 2026 (5 checks passed)

He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Apr 30, 2026
Motivation:
Reduce allocation overhead in common numeric rendering paths.

Modification:
1. RenderUtils.renderDouble reuses pre-cached string representations for exact integer doubles in the range 0-255.
2. Materializer.stringify delegates number stringification to RenderUtils.renderDouble, removing its duplicate integer fast path.

Result:
Numeric materialization uses the shared renderDouble fast path. The Builtin1.apply1 / Builtin3.apply3 specialization from the original PR is already present in current master via databricks#807, so it is no longer part of this PR diff.